Abstract

We study the satisfiability problem of symbolic tree automata and decompose it into the satisfiability 
problem of the existential first-order theory of the input characters and the existential monadic 
second-order theory of the indices of the accepted words. We use our decomposition to obtain tight 
computational complexity bounds on the decision problem for this automata class and an extension 
that considers linear arithmetic constraints on the underlying effective Boolean algebra. 
1 Introduction


The purpose of this paper is to expose a certain analogy that can be made between the
computational complexity analysis of the decision problem of symbolic tree automata and the 
decision problem of a class of logical structures known as power structures in model theory. 

The notion of symbolic automata first appeared in [41], but it was not until [39] that
this class of automata regained attention. Symbolic automata have been used in a variety of
applications including the analysis of regular expressions [8, 39], string encoders [15, 24, 25],
functional programs [12], code generation, parallelisation [33] and symbolic matching [34].
The specific subclass of symbolic tree automata (STAs) was later studied in the sequence of
publications [9, 12, 21, 26, 37].

Several theoretical investigations have been carried out on computational aspects of the 
symbolic automata model, including [2,8,36]. In particular, the authors of [38] observed 
that such an automata model had been studied previously by Bès in [5]. In his paper, Bès 
introduced a class of multi-tape synchronous finite automata whose transitions are labelled 
by first-order formulas. He then proved various properties of the languages accepted by such
automata, including closure under Boolean, rational and projection operations, logical
characterisations in terms of MSO logic and the Eilenberg-Elgot-Shepherdson formalism
[16], as well as decidability properties. Remarkably, the paper showed that the notion of
recognisability for such automata coincides with that of definability for certain generalised
weak powers, first studied by Feferman and Vaught [18]. In the concluding remarks of his
work, Bès noted that “all results in this paper can be extended to the case of infinite words as 
well as (in)finite binary trees, by relying on classical decidability results for MSO theories”. 

The techniques of Feferman and Vaught allow decomposing the decision problem for
the first-order theory of a product of structures¹, $\mathrm{Th}(\prod_{i \in I} M_i)$, into the first-order theory
of the structures $M_i$, $\mathrm{Th}(M_i)$, and the monadic second-order theory of the index set $I$,
$\mathrm{Th}^{mon}(\langle I, \ldots \rangle)$, where the structure $\langle I, \ldots \rangle$ may contain further relations such as a finiteness
predicate, a cardinality operator, etc. If the theory of the components $\mathrm{Th}(M_i)$ is decidable
for each $i \in I$, then the decision problem reduces to that of the theory $\mathrm{Th}^{mon}(\langle I, \ldots \rangle)$. To
analyse these structures, Feferman and Vaught extended results going back to Löwenheim,
Skolem and Behmann [3, 31, 35]. Technically, the decomposition is expressed in terms of
so-called reduction sequences.

¹ The notion of the theory of a structure for first-order and monadic second-order theories is standard in
model theory. We refer the interested reader to the book of Hodges [23] for the details.

It is known [13] that many model-theoretic constructions incur non-elementary blow-ups
in the formula size. This includes the size of the Feferman-Vaught reduction
sequences in the case of disjoint unions. Perhaps for this reason, no computational complexity
results have been obtained for the theory of symbolic tree automata and related models.
Instead, the results in the literature [10, 11, 20] refer to the decidability of the satisfiability
problem of the monadic predicates or provide asymptotic run-times rather than a refined
computational complexity classification, which could help to evaluate the speed and scale at
which we can hope to solve the satisfiability problem.

As a main contribution, we show how to reduce the satisfiability problem for symbolic 
tree automata to the satisfiability problem of the existential first-order theory of the input 
characters and the existential monadic second-order theory of the indices. This decomposition 
allows us to derive tight complexity bounds for the decision problem of the automaton in 
the precise sense of Corollary 22. We then study an extension of the formalism of symbolic 
tree automata which also imposes linear arithmetic constraints on the cardinalities of the 
Venn regions of the underlying effective Boolean algebra. In particular, this extension allows 
expressing the number of occurrences of a particular kind of letter in a word. We show in 
Corollary 25 that the computational complexity of the corresponding satisfiability problem 
is the same as the one for the simpler model without cardinalities. Similar extensions for 
related models of automata are considered in the literature on data words [19]. 

Organisation of the paper. Section 2 introduces symbolic tree automata. Section 3 
gives a preliminary Feferman-Vaught decomposition of symbolic tree automata in terms of 
the theory of the elements and the theory of the indices. Section 4 describes the decision 
procedure with which, in Section 5, after presenting the quantifier-free theory of Boolean 
algebra with Presburger arithmetic, we obtain the tight complexity bounds announced. 
Section 6 describes the extension of symbolic tree automata that uses linear arithmetic 
constraints over the cardinalities of the automaton’s underlying effective Boolean algebra and 
proves the corresponding upper bounds for the associated satisfiability problem. Section 7 
concludes the paper. 


2 Symbolic Tree Automata (STA)


In this section, we introduce the automata model that we will study in the rest of the paper. 
Unlike traditional tree automata, symbolic automata read input characters over a not 
necessarily finite domain, which has the structure of a Boolean algebra of sets defined by a 
family of monadic predicates. To ensure compatibility with the Boolean algebra operations, 
the family of monadic predicates defining the sets needs to be closed under propositional 
operations and contain formulae denoting the empty set and the universe. Furthermore, the 
definition requires that checking satisfiability of the monadic predicates is decidable. In later 
sections, we will refine this assumption with different complexity-theoretic bounds. 


▶ Definition 1 ([11]). An effective Boolean algebra $\mathcal{A}$ is a tuple

$$(D, \Psi, [\![\cdot]\!], \bot, \top, \vee, \wedge, \neg)$$

where $D$ is a set of domain elements, $\Psi$ is a set of unary predicates over $D$ that is closed
under the Boolean connectives, with $\bot, \top \in \Psi$, and $[\![\cdot]\!] : \Psi \to 2^D$ is a function such that
1. $[\![\bot]\!] = \emptyset$,
2. $[\![\top]\!] = D$, and
3. for all $\psi, \psi_1, \psi_2 \in \Psi$, we have that
   a. $[\![\psi_1 \vee \psi_2]\!] = [\![\psi_1]\!] \cup [\![\psi_2]\!]$,
   b. $[\![\psi_1 \wedge \psi_2]\!] = [\![\psi_1]\!] \cap [\![\psi_2]\!]$,
   c. $[\![\neg \psi]\!] = D \setminus [\![\psi]\!]$.
4. Checking $[\![\psi]\!] \neq \emptyset$ is decidable.

A predicate $\psi \in \Psi$ is atomic if it is not a Boolean combination of predicates in $\Psi$.


We now give two examples of the notion of effective Boolean algebra. From the
various examples in the literature, we choose one that matches one of our initial motivations:
to generalise the complexity results obtained for array theories in [1, 32]. We observe that the
notion of SMT algebra in [11, Example 2.3] precisely corresponds to the language introduced
in [32, Definition 5] but omitting cardinality constraints. We take this as a first example of
an effective Boolean algebra.


▶ Example 2. The SMT algebra for a type $\tau$ is the tuple $(D, \Psi, [\![\cdot]\!], \bot, \top, \vee, \wedge, \neg)$ where $D$
is the domain of $\tau$, $\Psi$ is the set of all quantifier-free formulas with one fixed free variable of
type $\tau$, $[\![\cdot]\!]$ maps each monadic predicate to the set of its satisfying assignments, $\bot$ denotes
the empty set (which can be represented by the formula $x \neq x$), $\top$ denotes the universe $D$
(which can be represented by the formula $x = x$) and $\vee, \wedge, \neg$ denote the Boolean algebra
operations of union, intersection, and complement respectively (which can be represented by
the propositional operations on quantifier-free formulae).


In applications, it is often useful to consider effective Boolean algebras whose generating 
monadic predicates use particular representations of formulae. In particular, Example 2 
can be contrasted with other representations of the monadic predicates, which consider 
implementation details. An example of the latter is the BDD effective Boolean algebra 
described in [9] which assumes that the set of elements of the underlying domain are expressed 
using binary decision diagrams [4]. 


▶ Example 3. The BDD algebra $\mathcal{B} = (\mathbb{N}, \Psi, [\![\cdot]\!], \bot, \top, |, \&, \sim)$ has the set of natural numbers
$\mathbb{N}$ as its universe, and $\Psi$ is the Boolean closure of the BDDs $\beta_i$ such that $[\![\beta_i]\!]$ is the set of natural
numbers $n$ whose $i$-th bit in binary representation is one. $\bot$ denotes the BDD
representing the empty set, $\top$ denotes the BDD representing the universal set and $|$, $\&$, $\sim$
denote the Boolean algebra operations of union, intersection, and complement. For instance,
$\beta_3 \mathbin{\&} {\sim}\beta_0$ denotes the set of numbers matching the binary bit-pattern ...1--0, which is satisfied
by 8, 24, ....
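The bit predicates of Example 3 can be illustrated with a small sketch. It is not a BDD implementation: the tuple encoding of predicates and the names `holds`, `bits_of` and `witness` are assumptions made for illustration, and emptiness is decided by brute-force enumeration of the bits that a predicate mentions.

```python
from itertools import product

# Hypothetical encoding: ("bit", i), ("not", p), ("and", p, q), ("or", p, q),
# ("top",) and ("bot",). Real implementations would use BDDs instead.

def holds(pred, n):
    """Decide whether the natural number n lies in the denotation of pred."""
    tag = pred[0]
    if tag == "top":
        return True
    if tag == "bot":
        return False
    if tag == "bit":
        return (n >> pred[1]) & 1 == 1
    if tag == "not":
        return not holds(pred[1], n)
    if tag == "and":
        return holds(pred[1], n) and holds(pred[2], n)
    if tag == "or":
        return holds(pred[1], n) or holds(pred[2], n)
    raise ValueError(f"unknown predicate tag: {tag}")

def bits_of(pred):
    """Collect the bit positions mentioned by pred."""
    if pred[0] == "bit":
        return {pred[1]}
    if len(pred) == 1:          # ("top",) or ("bot",)
        return set()
    return set().union(*(bits_of(p) for p in pred[1:]))

def witness(pred):
    """Return some n in the denotation of pred, or None if it is empty."""
    positions = sorted(bits_of(pred))
    for values in product([0, 1], repeat=len(positions)):
        n = sum(v << i for v, i in zip(values, positions))
        if holds(pred, n):
            return n
    return None

# The predicate "bit 3 is one and bit 0 is zero" from Example 3.
pattern = ("and", ("bit", 3), ("not", ("bit", 0)))
```

Only the mentioned bits matter for emptiness, so the enumeration is exponential in the number of distinct bits of one predicate, which is acceptable for a toy example only.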


We now introduce the automata model we will investigate in the paper. As in [37], we 
will assume that our automata read binary trees. 


▶ Definition 4. A binary $\Sigma$-tree is a function $\tau : A \to \Sigma$ where $A$ is a finite subset of $\{0,1\}^*$
closed under the initial segment relation (i.e. if $uv \in A$ then $u \in A$). $\Sigma^\#$ is the class of all
binary $\Sigma$-trees. $\Lambda$ is the function with domain $\emptyset$, also known as the empty tree. For $\sigma \in \Sigma$ and
$\tau, \tau' \in \Sigma^\#$, $\sigma[\tau, \tau']$ is the $\Sigma$-tree with root $\sigma$, left subtree $\tau$ and right subtree $\tau'$.


The crucial difference with traditional tree automata comes in the definition of the 
transition relation which occurs at two levels: the symbolic level, at which we only consider 
the particular formula that is satisfied, and the concrete level in which we also consider the 
input character from the effective Boolean algebra that satisfies the predicate. 


▶ Definition 5 ([37]). A symbolic tree automaton (STA) is a tuple

$$M = (\mathcal{A}, Q, q_0, F, \Delta)$$

where
1. $\mathcal{A}$ is an effective Boolean algebra.
2. $Q$ is a finite set of states.
3. $q_0 \in Q$ is the initial state.
4. $F \subseteq Q$ is the set of final states.
5. $\Delta \subseteq Q \times \Psi_{\mathcal{A}} \times Q \times Q$ is a finite set of transitions.

A symbolic transition $\rho = (q_1, \psi, q_2, q_3) \in \Delta$, also denoted $(q_1, q_2) \xrightarrow{\psi} q_3$, has source states
$q_1$ and $q_2$, target state $q_3$, and guard $\psi$. For $d \in D$, the concrete transition $(q_1, q_2) \xrightarrow{d} q_3$
denotes that there exists a symbolic transition $(q_1, q_2) \xrightarrow{\psi} q_3 \in \Delta$ such that $d \in [\![\psi]\!]$.

The language of $M$ at state $q \in Q$, denoted by $\mathcal{L}_q(M)$, is the smallest subset of $D^\#$
such that
1. if $q \in F$ then $\Lambda \in \mathcal{L}_q(M)$,
2. if $(q_1, \psi, q_2, q_3) \in \Delta$, $d \in [\![\psi]\!]$ and for $i \in \{1,2\}$, $\tau_i \in \mathcal{L}_{q_i}(M)$, then $d[\tau_1, \tau_2] \in \mathcal{L}_{q_3}(M)$.

The language of $M$ is $\mathcal{L}(M) = \mathcal{L}_{q_0}(M)$.
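The inductive definition of $\mathcal{L}_q(M)$ can be read as a bottom-up evaluation. The following is a minimal sketch under stated assumptions: trees are nested tuples `(label, left, right)` with `None` standing for $\Lambda$, guards are Python callables standing in for membership in $[\![\psi]\!]$, and the function names and the small automaton are hypothetical, not taken from the paper.

```python
def states_of(tree, final, transitions):
    """Return the set of states q such that tree belongs to L_q(M)."""
    if tree is None:                       # the empty tree: in L_q iff q is final
        return set(final)
    label, left, right = tree
    left_states = states_of(left, final, transitions)
    right_states = states_of(right, final, transitions)
    return {
        q3
        for (q1, guard, q2, q3) in transitions   # tuple order as in Definition 5
        if q1 in left_states and q2 in right_states and guard(label)
    }

def accepts(tree, initial, final, transitions):
    """L(M) = L_{q0}(M): accept iff the initial state is reachable at the root."""
    return initial in states_of(tree, final, transitions)

# Hypothetical automaton: non-empty trees whose nodes carry positive labels
# and have either zero or two children.
positive = lambda x: x > 0
FINAL = {"e"}
TRANSITIONS = [
    ("e", positive, "e", "p"),   # a leaf with a positive label
    ("p", positive, "p", "p"),   # an inner node with two accepted subtrees
]

leaf = lambda x: (x, None, None)
```

For instance, `accepts((1, leaf(2), leaf(3)), "p", FINAL, TRANSITIONS)` evaluates the two leaves first and then looks for a transition whose guard holds at the root.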

We next give examples of automata running over the algebras of Example 2 and 3. 
▶ Example 6 ([37]). We consider the language of linear arithmetic over the integers. We
set three formulae $\psi_{>0}(x) := x > 0$, $\psi_{<0}(x) := x < 0$, $\psi_{=0}(x) := x = 0$ satisfied by all positive
integers, all negative integers and zero, respectively. The symbolic tree automaton with
states $q_{root}, q_-, q_0, q_+, q_\varepsilon$, final state $q_\varepsilon$, initial state $q_{root}$ and transitions

$$\begin{array}{lll}
(q_-, q_+, \psi_{=0}, q_{root}) & (q_+, q_-, \psi_{=0}, q_0) & (q_\varepsilon, q_\varepsilon, \psi_{=0}, q_0) \\
(q_-, q_0, \psi_{<0}, q_-) & (q_\varepsilon, q_\varepsilon, \psi_{<0}, q_-) & \\
(q_0, q_+, \psi_{>0}, q_+) & (q_\varepsilon, q_\varepsilon, \psi_{>0}, q_+) &
\end{array}$$

accepts all trees such that the root has label 0, its left son is a $-$ node and its right son is
a $+$ node, every $-$ node has a negative label and is either a leaf or its left son is a $-$ node
and its right son is a 0 node. Similarly, every $+$ node has a positive label and is either a
leaf or its right son is a $+$ node and its left son is a 0 node. For example, the following tree
would be accepted:


[Figure: example of an accepted tree; the drawing is not recoverable from the source.]


▶ Example 7 ([9]). We consider the language of the BDD algebra in Example 3. The
following symbolic tree automaton accepts all trees whose labels represent integers such that
whenever the $i$-th bit of such an integer in binary representation is one, then the $j$-th bit of the
integer in binary representation is also one. The automaton has a single state $q$ and a single
transition rule $(q, q, {\sim}\beta_i \mid \beta_j, q)$.


3 Decomposition through Shared Set Variables


In this section, we start with a symbolic tree automaton $M = (\mathcal{A}, Q, q_0, F, \Delta)$ and denote by
$\psi_1, \ldots, \psi_r, \ldots$ the atomic predicates in the underlying effective Boolean algebra $\mathcal{A}$. Our first
observation is that the definition of symbolic tree automaton allows assuming that the set of
these predicates is finite.


▶ Lemma 8. There exists a symbolic tree automaton $M' = (\mathcal{A}', Q, q_0, F, \Delta)$ such that
$\mathcal{L}(M) = \mathcal{L}(M')$ and the cardinality of $\Psi_{\mathcal{A}'}$ is finite.


Proof. By definition, the automaton $M$ has a finite number of transitions. We take $\Psi_{\mathcal{A}'}$ to
be the Boolean closure of the predicates occurring in these transitions. It follows that $\Psi_{\mathcal{A}'}$ is
a finite set. We define the remaining components of $\mathcal{A}'$ as those in the definition of $\mathcal{A}$. Since
the structure of the automaton is unchanged under this transformation, it follows that the
two languages are equal, i.e. $\mathcal{L}(M) = \mathcal{L}(M')$. ◀


From Lemma 8, it follows that without loss of generality we may assume that $\Psi_{\mathcal{A}}$ is finite.
Thus, the set of atomic predicates (see Definition 1) is finite too. In the remainder of the
paper, we will work under this assumption, and we will write $\varphi_1, \ldots, \varphi_k$ for the generators of
the effective Boolean algebra used by the symbolic tree automaton $M$. Similarly, we will
write $\psi_1, \ldots, \psi_m$ for the actual predicates used in the transitions of $M$. We will decompose
the study of $\mathcal{L}(M)$ into the study of the properties of the input characters in $D$ and the
indexing properties induced by the transition structure of the automaton. Both kinds of
properties will refer to variables representing sets of indices, to stay synchronised with each
other. This methodology for combining theories was previously studied in [42].

To specify the properties of the input characters in $D$, we use set interpretations of the
form:

$$\bigwedge_{i=1}^{k} S_i = \{ n \in \{0,1\}^* \mid \varphi_i(d(n)) \} \qquad (1)$$

where $d(n)$ is the element occurring at position $n$ in the tree $d \in D^\#$. These sets can be
pictured via a Venn diagram of interpreted sets, such as the one in Figure 1. Each formula in
$\Psi_{\mathcal{A}}$ corresponds to a particular Venn region in this diagram and can be referred to using a
Boolean algebra expression on the variables $S_1, \ldots, S_k$, thanks to the set interpretation (1).


A concrete transition $(q_1, q_2) \xrightarrow{d} q_3$ requires a value $d \in D$. This value will lie in some
elementary Venn region of the diagram in Figure 1, i.e. in a set of the form $S_1^{\beta_1} \cap \ldots \cap S_k^{\beta_k}$
where $\beta = (\beta_1, \ldots, \beta_k) \in \{0,1\}^k$, $S^0 := S^c$ and $S^1 := S$. We will denote such a Venn region
with the bit-string $\beta$. To specify the transition structure of the automaton, what is relevant
to us is the region of the Venn diagram, not the specific value that it takes there. Thus, we
can relabel the transitions of the automaton by the propositional formulae corresponding to
the monadic predicates they held originally.


▶ Example 9. In Example 6 the labelling predicates $\psi_{>0}$, $\psi_{<0}$ and $\psi_{=0}$ would be replaced by
propositional formulae $S_1$, $S_2$ and $S_3$. Similarly, if in Example 7 we take as atomic formulae
the predicates $\beta_i$, then the formula ${\sim}\beta_i \mid \beta_j$ corresponds to the propositional formula $\neg S_i \vee S_j$.


It follows that a run of the automaton can be encoded as a tree of bit-strings $\tau : A \subseteq
\{0,1\}^* \to \{0,1\}^k$ (with $A$ prefix-closed) and that these bit-strings only need to satisfy
the propositional formulae corresponding to the predicates labelling the transitions of the
automaton. Figure 2 represents one such run over an uninterpreted Venn diagram.

Figure 1. A Venn diagram representing a finitely generated effective Boolean algebra with atomic
predicates $\varphi_1$, $\varphi_2$ and $\varphi_3$.

We denote by $L_1, \ldots, L_m$ such propositional formulae and by $M(L_1, \ldots, L_m)$ the set
of bit-string trees accepted by $M$, which we call tree tables following the terminology of
Kleene [28]. Lemma 10 observes that the language $\mathcal{L}(M)$ can be expressed in terms of set
interpretations of the form (1) and the condition:

$$\exists \tau \in M(L_1, \ldots, L_m).\ \bigwedge_{i=1}^{k} S_i = \{ n \in \{0,1\}^* \mid \tau_i(n) \} \qquad (2)$$

where $\tau_i(n)$ denotes the $i$-th bit of $\tau$ at position $n \in \{0,1\}^*$.

This is a Feferman-Vaught decomposition in the sense that we explain next. Observe
first that following the automata-logic connection discovered by Büchi [6] and extended by
Doner [14], the tuples of sets $(S_1, \ldots, S_k)$ definable in (2) are precisely those tuples of sets
definable in weak second-order logic of two successors. Thus, what changes in the expression
of formula (2) is the particular representation of these relations. Lemma 10 decomposes
the satisfiability problem of symbolic tree automata into the satisfiability problem of the
existential first-order theory of the input characters and the satisfiability problem of a certain
representation of the monadic second-order theory of two successors.


▶ Lemma 10 (Feferman-Vaught decomposition for STAs).

$$\mathcal{L}(M) = \Big\{ d \in D^\# \ \Big|\ \exists \tau \in M(L_1, \ldots, L_m).\ \bigwedge_{i=1}^{k} S_i = \{ n \in \{0,1\}^* \mid \varphi_i(d(n)) \} = \{ n \in \{0,1\}^* \mid \tau_i(n) \} \Big\}$$

Proof. The proof uses the definition of $\mathcal{L}(M)$ and $M(L_1, \ldots, L_m)$. For the left to right
inclusion, one defines $\tau$ using the membership of the values $d(n)$ in the elementary Venn
regions $\beta$. For the right to left inclusion, the definition of $M(L_1, \ldots, L_m)$ ensures that there
is an accepting run of $M$ corresponding to the value $d$ thanks to the interpretations of the
sets $S_i$. ◀
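The left-to-right direction of the lemma can be made concrete with a short sketch. It assumes trees encoded as nested tuples `(label, left, right)` with `None` for the empty tree, positions written as strings over `0` and `1`, and atomic predicates given as Python callables; the helper names are hypothetical.

```python
def positions(tree, prefix=""):
    """Yield (position, label) pairs; the root has the empty position."""
    if tree is None:
        return
    label, left, right = tree
    yield prefix, label
    yield from positions(left, prefix + "0")
    yield from positions(right, prefix + "1")

def tree_table(tree, atoms):
    """Map each position n to the bit-string (phi_1(d(n)), ..., phi_k(d(n)))."""
    return {n: tuple(int(phi(d)) for phi in atoms) for n, d in positions(tree)}

def shared_sets(table, k):
    """Return the list of sets { n | tau_i(n) }, one per atomic predicate."""
    return [{n for n, bits in table.items() if bits[i]} for i in range(k)]

# Atomic predicates in the spirit of Example 6 (x > 0, x < 0, x = 0).
ATOMS = [lambda x: x > 0, lambda x: x < 0, lambda x: x == 0]
EXAMPLE_TREE = (0, (-1, None, None), (5, None, None))
```

By construction, the set computed for the `i`-th predicate contains exactly the positions whose label satisfies that predicate, which is the synchronisation between the two theories expressed by the shared variables $S_i$.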


It is important to note that both sides of the equality in Lemma 10 use essentially the 
same number of bits in their description, since the set M(L1,..., Lm) can be described by 
the automaton with propositional labels or, if preferred, an equivalent regular expression. 
Thus, the complexity of the non-emptiness problem for both sets is the same. 

In the next sections, we make use of this decomposition to devise a decision procedure
for symbolic tree automata, which will refine the existing computational complexity results
for the corresponding satisfiability problem.


4 Decision Procedure for Non-Emptiness


▶ Definition 11. The non-emptiness problem for a symbolic tree automaton $M$ is the problem
of determining whether $\mathcal{L}(M) \neq \emptyset$.
Figure 2. A tree table accepted by a symbolic tree automaton represented over an uninterpreted
Venn diagram (left) and as a bit-string tree (right). According to Doner's interpretation, the sets
$A, B, C$ are $A = \{0, 1, 00, 01\}$, $B = \{\varepsilon, 0, 1, 11\}$ and $C = \{1, 01, 10, 11\}$.


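By Lemma 10, checking non-emptiness of the language of a symbolic tree automaton
reduces to checking whether the following formula is true:

$$\exists S_1, \ldots, S_k.\ \exists d.\ \bigwedge_{i=1}^{k} S_i = \{ n \in \{0,1\}^* \mid \varphi_i(d(n)) \}\ \wedge\ \exists \tau \in M(L_1, \ldots, L_m).\ \bigwedge_{i=1}^{k} S_i = \{ n \in \{0,1\}^* \mid \tau_i(n) \} \qquad (3)$$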


To establish the complexity of deciding formulae of the form (3), we will have to analyse
further the set $M(L_1, \ldots, L_m)$. Each tree table $\tau$ in $M(L_1, \ldots, L_m)$ corresponds to a symbolic
table $s$ whose entries are the propositional formulae that the bit-strings of $\tau$ satisfy. More
generally, these symbolic tables are generated by the symbolic automaton obtained by
replacing the predicates of the symbolic automaton by propositional formulae. The set of
symbolic tables accepted by the automaton $M$ is a regular tree set and will be denoted by
$M_S(L_1, \ldots, L_m)$.


▶ Example 12. The automaton in Example 7 corresponds, according to Example 9, to the
symbolic automaton with a single state $q$ and a single transition rule $(q, q, \neg S_i \vee S_j, q)$.


To find the computational complexity of deciding formulae of the form (3), consider first
the case where the propositional formulae $L_1, \ldots, L_m$ for the automaton $M$ denote disjoint
Venn regions. In such a case, checking the satisfiability of formula (3) reduces to determining
whether there exists a symbolic tree table $s$ such that whenever the number of times a certain
propositional letter occurs is non-zero, the corresponding Venn region interpreted according
to (1) has a satisfiable defining formula. It follows from this observation that our decision
procedure will need to compute the so-called Parikh image of the regular tree language
$M_S(L_1, \ldots, L_m)$.
▶ Definition 13 (Parikh Image).
The Parikh image of $M_S(L_1, \ldots, L_m)$ is the set

$$\mathrm{Parikh}(M_S(L_1, \ldots, L_m)) = \{ (|s|_{L_1}, \ldots, |s|_{L_m}) \mid s \in M_S(L_1, \ldots, L_m) \}$$

where $|s|_{L_i}$ denotes the number of occurrences of the propositional formula $L_i$ in the symbolic
tree table $s$.
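As an illustration of the definition and of the disjoint-regions case discussed before it, the sketch below computes the Parikh vector of a single symbolic table and checks that every letter that occurs denotes a satisfiable region. The dictionary encoding of tables (position to letter name) and both function names are assumptions made for the example.

```python
from collections import Counter

def parikh_vector(symbolic_table, letters):
    """Count how often each letter L_1, ..., L_m occurs in the table."""
    counts = Counter(symbolic_table.values())
    return tuple(counts[letter] for letter in letters)

def disjoint_case_ok(vector, region_is_satisfiable):
    """In the disjoint case, a table is realisable when every letter that
    occurs at least once denotes a Venn region with a satisfiable formula."""
    return all(sat for count, sat in zip(vector, region_is_satisfiable)
               if count > 0)

# A table with root labelled L1 and two children labelled L2.
EXAMPLE_TABLE = {"": "L1", "0": "L2", "1": "L2"}
LETTERS = ["L1", "L2", "L3"]
```

Here `region_is_satisfiable` stands for the answers of the decision procedure for the theory of the elements, one Boolean per letter.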


To compute the Parikh image of a regular tree language, we will use a key observation by
Klaedtke and Rueß [27] that allows us to reduce the problem to that of computing the Parikh
image of a context-free grammar.

First, note that the tree language $M_S(L_1, \ldots, L_m)$ is given by a non-deterministic bottom-up
tree automaton (this follows from Definition 5). However, the construction of Klaedtke
and Rueß is given for non-deterministic top-down tree automata.


▶ Definition 14. A non-deterministic top-down tree automaton is a tuple $A = (Q, \Sigma, \delta, q_0, F)$,
where

- $Q$ is a finite set of states.
- $\Sigma$ is an alphabet.
- $\delta : Q \times \Sigma \to \mathcal{P}(Q \times Q)$ is the transition function.
- $q_0 \in Q$ is the initial state.
- $F \subseteq Q$ is the set of final states.


Associated with A is the function 6 : S# — S defined by 6(A) = A and 


d(a[r,7']) = 4(qo, 0) 


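A run $\varrho$ of $A$ on $t \in T_\Sigma$ is a $Q$-labelled tree $\varrho$ with $\mathrm{dom}(\varrho) = \{\Lambda\} \cup \{ ub \mid u \in \mathrm{dom}(t), b \in
\{0,1\} \}$ such that $\varrho(\Lambda) = q_0$ at the root position $\Lambda$, and $(\varrho(u0), \varrho(u1)) \in \delta(\varrho(u), t(u))$, for all $u \in \mathrm{dom}(t)$.

$\varrho$ is accepting if all leaves of $\varrho$ are labelled with states in $F$, i.e. $\varrho(u) \in F$, for all
$u \in \mathrm{dom}(\varrho) \setminus \mathrm{dom}(t)$.

A tree $t$ is recognized by $A$ if there is an accepting run of $A$ on $t$.

$T(A)$ denotes the set of trees that are recognized by $A$.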


Fortunately, it is easy to convert from non-deterministic top-down to non-deterministic 
bottom-up tree automata. 


▶ Proposition 15 ([7, Theorem 1.6.1]). The class of languages accepted by top-down NFTAs
is precisely the class of languages accepted by bottom-up NFTAs. Given a top-down (bottom-up)
NFTA one can compute a bottom-up (top-down) NFTA in linear time in the number of
edges and states of the input.


Second, we use the observation of Klaedtke and Rueß [27, Lemma 17] to compute a
context-free grammar with the same Parikh image.


▶ Lemma 16 ([27]). For any non-deterministic top-down tree automaton $A$ one can compute
in linear time a context-free grammar $G_A$ expressing the trees accepted by $A$ as words obtained
through the in-order traversal of the trees. As a consequence, $\mathrm{Parikh}(A) = \mathrm{Parikh}(G_A)$.


Proof. Let $A = (Q, \Sigma, \delta, q_I, F)$ be a top-down tree automaton. We define a context-free
grammar $G = (V, \Sigma, R, S)$ that generates the words obtained by traversing the trees recognized
by $A$ in infix order as follows:

- $V = Q$ is the set of non-terminal symbols.
- $\Sigma$ is the set of terminal symbols.
- There are two kinds of derivation rules:
  - For each $(q, q') \in \delta(p, b)$, we have the rule $p \to q\, b\, q'$.
  - If $(F \times F) \cap \delta(q, b) \neq \emptyset$ then we have the rule $q \to b$.
- $S = q_I$ is the start symbol of $G$.

It is immediate from the definition that $\mathrm{Inorder}(\mathcal{L}(A)) = \mathcal{L}(G)$ and that the size of
the grammar is linear in that of the automaton. Since the Parikh image is invariant under
permutation of the labels, it follows that $\mathrm{Parikh}(A) = \mathrm{Parikh}(G_A)$. ◀
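The grammar construction in the proof can be sketched as follows. The encoding is an assumption made for illustration: `delta` is a dictionary from `(state, letter)` to a set of `(left_state, right_state)` pairs, rules are pairs `(lhs, rhs)` with `rhs` a tuple, and `derives` is a naive membership test that is only suitable for tiny inputs.

```python
def grammar_rules(delta, final):
    """Build the two kinds of rules from the proof of Lemma 16."""
    rules = set()
    for (p, b), successors in delta.items():
        for (q, q2) in successors:
            rules.add((p, (q, b, q2)))            # p -> q b q'
        if any(q in final and q2 in final for (q, q2) in successors):
            rules.add((p, (b,)))                  # p -> b
    return rules

def derives(rules, symbol, word):
    """Check whether `symbol` derives the tuple of terminals `word`.
    Terminates because every rule consumes one terminal."""
    for lhs, rhs in rules:
        if lhs != symbol:
            continue
        if len(rhs) == 1:
            if word == rhs:
                return True
            continue
        left, b, right = rhs
        for i, letter in enumerate(word):
            if (letter == b
                    and derives(rules, left, word[:i])
                    and derives(rules, right, word[i + 1:])):
                return True
    return False

# Hypothetical automaton: trees over {a} in which every node has zero or two children.
DELTA = {("q0", "a"): {("q0", "q0"), ("f", "f")}}
RULES = grammar_rules(DELTA, final={"f"})
```

For this automaton the in-order words are the odd-length words over `a`, so the Parikh image is the set of odd numbers, which is what the grammar reproduces.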


The second key observation, by Verma, Seidl and Schwentick, is that the Parikh image of
a context-free grammar can be described by a linear-sized existential Presburger arithmetic
formula.


▶ Lemma 17 ([40]). Given a context-free grammar $G$, one can compute an existential
Presburger formula $\phi_G$ for the Parikh image of $\mathcal{L}(G)$ in linear time.


In summary, 


▶ Lemma 18. The set $\mathrm{Parikh}(M_S(L_1, \ldots, L_m))$ is definable by an existential Presburger
formula $\rho$ of size $O(|M|)$ where $|M|$ is the number of symbols used to describe the automaton
$M$.


In the more general case, when propositional letters denote overlapping Venn regions,
a partitioning argument is required. This is formalised in Theorem 19. First, we fix some
notation. We set $p_\beta := \bigcap_{i=1}^{k} S_i^{\beta_i}$ where $\beta \in \{0,1\}^k$, and $p_L := \bigcup_{\beta \models L} p_\beta$ where $L$ is a propositional
formula and $\models$ is the propositional satisfaction relation that is true if and only if the
assignment of the values in $\beta$ to the free variables in $L$ makes the formula $L$ true. When
using the interpretation of sets of the form (1), the formula defining the Venn region $p_\beta$ will
be denoted by $\varphi^\beta(d) := \bigwedge_{i=1}^{k} \varphi_i^{\beta_i}(d)$. We write $S_1 \mathbin{\dot\cup} S_2$ to denote the set $S_1 \cup S_2$ where we
want to emphasise that $S_1 \cap S_2 = \emptyset$. Finally, we use the notation $[n] := \{1, \ldots, n\}$ to refer
to the first $n$ natural numbers.

We give next the technical statement of the main theorem of this section. The reader 
should refer to the explanations following the statement for the intuition behind it. 


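▶ Theorem 19. Formula (3) is equivalent to the formula

$$\begin{aligned}
&\exists s \in [m].\ \exists \sigma : [s] \hookrightarrow [m].\ \exists \beta_1, \ldots, \beta_s \in \{0,1\}^k.\ \bigwedge_{j=1}^{s} \exists d.\ \varphi^{\beta_j}(d)\ \wedge \\
&\exists k_1, \ldots, k_m.\ \exists S_1, \ldots, S_k, P_1, \ldots, P_s. \\
&\qquad \rho(k_1, \ldots, k_m) \wedge \bigwedge_{i=1}^{s} P_i \subseteq p_{L_{\sigma(i)}} \wedge \bigcup_{i=1}^{m} p_{L_i} = \dot{\bigcup}_{i=1}^{s} P_i \\
&\qquad \wedge \bigwedge_{i=1}^{s} |P_i| = k_{\sigma(i)} \wedge \bigwedge_{i=1}^{s} p_{\beta_i} \cap P_i \neq \emptyset \qquad (4)
\end{aligned}$$

where $\sigma$ is an injection from $\{1, \ldots, s\}$ to $\{1, \ldots, m\}$, $\rho$ is the arithmetic expression in
Lemma 18 and $|\cdot|$ denotes the cardinality of the argument set expression.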
Figure 3. A Venn diagram representing the situation discussed in Example 20. To deal with the
overlapping regions, it is necessary to decide to which sets the different indices belong. In the
figure, the different parts are marked with colours.


We start by observing that formula (4) has two parts. The first part corresponds to the
subterm $\bigwedge_{j=1}^{s} \exists d.\ \varphi^{\beta_j}(d)$ and falls within the existential theory of the elements in $D$, $\mathrm{Th}_{\exists}(D)$.
The second part corresponds to the remaining subterm and falls within the quantifier-free
first-order theory of Boolean Algebra with Presburger arithmetic (QFBAPA) [30], which can be
viewed as the monadic second-order theory $\mathrm{Th}_{\exists}^{mon}(\langle \mathbb{N}, \subseteq, \sim \rangle)$ where $\sim$ is the equicardinality
relation between two sets.


The second observation is that formula (4) is distilled from a non-deterministic decision
procedure for the formulae of the shape (3). The existentially quantified variables
$s, \sigma, \beta_1, \ldots, \beta_s$ are guessed by the procedure. These guessed values are then used by specialised
procedures for $\mathrm{Th}_{\exists}(D)$ and $\mathrm{Th}_{\exists}^{mon}(\langle \mathbb{N}, \subseteq, \sim \rangle)$. For the convenience of the reader, we
describe here what these values mean (this meaning follows from the proof of the theorem
below). The value of $s$ represents the number of Venn regions associated to the formulae
$L_1, \ldots, L_m$ that will be non-empty. $\sigma$ indexes these non-empty regions. $\beta_1, \ldots, \beta_s$ are
elementary Venn regions contained in the non-empty ones.

Observe that Theorem 19 refines the statement in Lemma 10: the satisfiability problem
of STAs is decomposed into the decision problem of the existential fragment of the theory of
the input characters and the existential fragment of the monadic second-order theory of the
indices.

Finally, it remains to exemplify the situation in which the Venn regions overlap, which
justifies the introduction of the partition variables $P_1, \ldots, P_s$ in formula (4).


▶ Example 20. Consider the situation where $S_1 \wedge S_2$ and $S_2 \wedge S_3$ are two propositional
formulae labelling the transitions of the symbolic automaton. These formulae correspond
to the Venn regions $S_1 \cap S_2$ and $S_2 \cap S_3$, which share the region $S_1 \cap S_2 \cap S_3$. Given a
model of $S_1$, $S_2$ and $S_3$, how do we guarantee that the indices in the region $S_1 \cap S_2 \cap S_3$
are consistent with a run of the automaton? For instance, the automaton may require one
element in $S_1 \cap S_2$ and another in $S_2 \cap S_3$. Placing a single index in $S_1 \cap S_2 \cap S_3$ would
satisfy the overall cardinality constraints, but not the fact that overall we need to have two
elements. Trying to specify these restrictions in the general case would reduce to specifying
an exponential number of cardinalities.
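The role of the partition variables in this example can be checked by brute force on small inputs. The sketch below is only an illustration: `find_partition` is a hypothetical helper that searches for pairwise disjoint sets, one inside each region and of the prescribed size, whose union is the union of all regions, mirroring the constraints on $P_1, \ldots, P_s$ in formula (4).

```python
from itertools import combinations

def find_partition(regions, sizes):
    """Search for pairwise disjoint P_i inside regions[i] with |P_i| = sizes[i]
    whose union equals the union of all regions. Returns a list of sets or None."""
    covered = set().union(*regions) if regions else set()

    def go(i, used):
        if i == len(regions):
            return [] if used == covered else None
        for choice in combinations(sorted(regions[i] - used), sizes[i]):
            rest = go(i + 1, used | set(choice))
            if rest is not None:
                return [set(choice)] + rest
        return None

    return go(0, set())

# One index lying in S1, S2 and S3 at once: both regions are non-empty,
# yet two distinct indices are needed.
S1, S2, S3 = {0}, {0}, {0}
single_index = find_partition([S1 & S2, S2 & S3], [1, 1])

# Two indices, one per region: a partition exists.
T1, T2, T3 = {0}, {0, 1}, {1}
two_indices = find_partition([T1 & T2, T2 & T3], [1, 1])
```

The first search fails and the second succeeds, which is the distinction that the disjointness and cardinality constraints on the sets $P_i$ are meant to capture.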


We proceed next to the proof of the theorem. 


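Proof of Theorem 19. ⇒) If formula (3) is satisfiable, then there are sets $S_1, \ldots, S_k$, a tree
$d$ and a tree table $\tau$ satisfying

$$\bigwedge_{i=1}^{k} S_i = \{ n \in \{0,1\}^* \mid \varphi_i(d(n)) \} \wedge \tau \in M(L_1, \ldots, L_m) \wedge \bigwedge_{i=1}^{k} S_i = \{ n \in \{0,1\}^* \mid \tau_i(n) \}$$

Let $\bar{s} \in M_S(L_1, \ldots, L_m)$ be the symbolic tree table corresponding to $\tau$ (where the bar
is used to distinguish it from the variable $s$ appearing in formula (4)). We define $k_i :=
|\bar{s}|_{L_i}$, $s := |\{ i \mid k_i \neq 0 \}|$, $\sigma$ mapping the indices in $[s]$ to the indices of the terms for which
$k_i$ is non-zero and $P_i := \{ n \in \{0,1\}^* \mid \bar{s}(n) = L_{\sigma(i)} \}$. It will be convenient to work out the
following equalities:

$$\begin{aligned}
p_{L_i} &= \bigcup_{\beta \models L_i} \bigcap_{j=1}^{k} S_j^{\beta_j} = \bigcup_{\beta \models L_i} \Big\{ n \in \{0,1\}^* \ \Big|\ \bigwedge_{j=1}^{k} \tau_j(n) = \beta_j \Big\} = \{ n \in \{0,1\}^* \mid \tau(n) \models L_i \} \\
p_{L_i} &= \bigcup_{\beta \models L_i} \bigcap_{j=1}^{k} S_j^{\beta_j} = \bigcup_{\beta \models L_i} \Big\{ n \in \{0,1\}^* \ \Big|\ \bigwedge_{j=1}^{k} \varphi_j^{\beta_j}(d(n)) \Big\} = \{ n \in \{0,1\}^* \mid L_i(\varphi(d(n))) \} \qquad (5)
\end{aligned}$$

where $L_i(\varphi(d(n)))$ is the propositional formula obtained by substituting each set variable $S_j$
by the formula $\varphi_j(d(n))$. We now deduce formula (4):

- $\rho(k_1, \ldots, k_m)$: from $\bar{s} \in M_S(L_1, \ldots, L_m)$, we have that
  $(k_1, \ldots, k_m) \in \mathrm{Parikh}(M_S(L_1, \ldots, L_m))$
  and therefore $\rho(k_1, \ldots, k_m)$.
- $P_i \subseteq p_{L_{\sigma(i)}}$: since $\bar{s}$ corresponds to $\tau$, for every position $n$ we have $\tau(n) \models \bar{s}(n)$ and the inclusion
  follows from the definition of $P_i$ and equation (5).
- $|P_i| = k_{\sigma(i)}$: since
  $|P_i| = |\{ n \in \{0,1\}^* \mid \bar{s}(n) = L_{\sigma(i)} \}| = |\bar{s}|_{L_{\sigma(i)}} = k_{\sigma(i)}$.
- Each pair of sets $P_i, P_j$ with $i < j$ is disjoint:
  $P_i \cap P_j = \{ n \in \{0,1\}^* \mid \bar{s}(n) = L_{\sigma(i)} \} \cap \{ n \in \{0,1\}^* \mid \bar{s}(n) = L_{\sigma(j)} \} = \{ n \in \{0,1\}^* \mid \bar{s}(n) = L_{\sigma(i)} = L_{\sigma(j)} \} = \emptyset$
  using that the letters $L$ are chosen to be distinct and that $\sigma$ is an injection (so $\sigma(i) \neq \sigma(j)$).
- $p_{L_1} \cup \ldots \cup p_{L_m} = P_1 \mathbin{\dot\cup} \ldots \mathbin{\dot\cup} P_s$: by definition,
  $P_i = \{ n \in \{0,1\}^* \mid \bar{s}(n) = L_{\sigma(i)} \}$ and $p_{L_i} = \{ n \in \{0,1\}^* \mid \tau(n) \models L_i \}$,
  and by definition of $\sigma$ it follows that the only letters that can appear in $\bar{s}$ are $L_{\sigma(1)}, \ldots, L_{\sigma(s)}$.
  Thus, we have $p_{L_1} \cup \ldots \cup p_{L_m} = \mathrm{dom}(\tau) = \mathrm{dom}(\bar{s}) = P_1 \mathbin{\dot\cup} \ldots \mathbin{\dot\cup} P_s$.
- There exist $\beta_1, \ldots, \beta_s \in \{0,1\}^k$ such that $\bigwedge_{i=1}^{s} p_{\beta_i} \cap P_i \neq \emptyset$: note that $P_i \neq \emptyset$ by
  definition of $\sigma$. Thus, there must exist some $\beta_i$ such that $p_{\beta_i} \cap P_i \neq \emptyset$. We pick any such
  $\beta_i$.
- $\bigwedge_{j=1}^{s} \exists d.\ \varphi^{\beta_j}(d)$: follows from $p_{\beta_j} \cap P_j \neq \emptyset$ and formula (5).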
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<) Conversely, if formula (4) is satisfiable, then there is an integer s € [n], an injection ø : 
[s] © [n], bit-strings 61,..., 8, € {0,1}*, integers k1,...,km and sets $1,...,9%,Pi,...,Ps 
satisfying 


AN adp” (d) A plki, km) A N Pi C Prso AUE pH = Uja P; 


j=l i=l 
A |Pil = Foy A N pe nP: #0 (6) 
i=l {=l 


From p(k1,...,kn) follows that there is a symbolic table 5 € Mgs(L1,..., Lm) such that 
|5|; = ki for each L; € { L1,..., Lm }. From formula (5) and 


pr, U... U pr, = PÙ... UP, A NP C Phoa ^ VAN IPi| = ko) 
i=l i=1 
follows that we can replace the formulae L; occurring in the symbolic table § by the 
bit-strings representing the elementary Venn regions to which the indices of the sets P; 
belong. Moreover, thanks to the condition AĴ- ps, P; 4 Ø follows that we can replace 
the letters L; by the bit-strings (;, defining T as T(n) = f5; ifn € P;. In this way, we 
obtain a table r € M(L1,..., Ls). We then define the corresponding word over D, thanks 
to the property Aj_, dd.¢*(d). Naming the witnesses of these formulae as d;, we define 
d(n) = fa; if n € P;. To conclude, note that: 


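\[
\{ n \in \{0,1\}^* \mid \tau_j(n) \} = \bigcup_{i \,\mid\, \beta_i(j) = 1} P_i = \{ n \in \{0,1\}^* \mid \varphi_j(d(n)) \}
\]

Thus, we have that formula (3) is satisfied by the set variables

\[
S_j := \{ n \in \{0,1\}^* \mid \tau_j(n) \} = \{ n \in \{0,1\}^* \mid \varphi_j(d(n)) \}
\]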
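The last step of this direction can be illustrated on a toy instance. The following is a minimal sketch, assuming hypothetical data: tree positions are encoded as strings over {0,1}, the blocks `P` and the bit-strings `beta` are invented for illustration, and the number of predicates is fixed to two. It only shows how the sets $S_j$ are read off from the table $\tau$; it is not part of the decision procedure.

```python
# Hypothetical toy data: tree positions are strings over {0,1}; "" is the root.
# P[i] is the block of positions carrying the i-th chosen letter,
# beta[i] is the bit-string (elementary Venn region) assigned to that block.
P = [{"", "0"}, {"1"}]
beta = [(1, 0), (0, 1)]  # two predicates, so bit-strings have length 2


def tau(n):
    """Concrete table: position n is labelled with the bit-string of its block."""
    for block, bits in zip(P, beta):
        if n in block:
            return bits
    raise KeyError(n)


def set_variable(j):
    """S_j = { n | tau_j(n) }, i.e. the union of the blocks P_i with beta_i(j) = 1."""
    return {n for block in P for n in block if tau(n)[j] == 1}


S = [set_variable(j) for j in range(2)]
```

Here `S[0]` collects the positions of the first block and `S[1]` those of the second, matching the union over the indices $i$ with $\beta_i(j) = 1$.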


5 Quantifier-free Boolean Algebra with Presburger Arithmetic 


The arguments following the statement of Theorem 19 sketch a non-deterministic procedure
for the satisfiability problem of symbolic finite automata, based on the existence of decision
procedures for $\mathrm{Th}_{\exists^*}(D)$ and $\mathrm{Th}_{\exists^*}^{\mathrm{mon}}(\langle \mathbb{N}, \subseteq, \sim \rangle)$. In this section, we recall the non-deterministic
polynomial time decision procedure for $\mathrm{Th}_{\exists^*}^{\mathrm{mon}}(\langle \mathbb{N}, \subseteq, \sim \rangle)$. As a consequence, we obtain
Corollary 22 which situates the decision problem of symbolic finite automata in the classical
complexity hierarchy. This section should also prepare the reader for the extension of these
results, where the automaton can require linear arithmetic constraints on the cardinalities of
the effective Boolean algebra. This extension is described in Section 6.

Instead of working with $\mathrm{Th}_{\exists^*}^{\mathrm{mon}}(\langle \mathbb{N}, \subseteq, \sim \rangle)$ directly, we use the logic QFBAPA [30] which
has the same expressive power [29, Section 2]. The syntax of QFBAPA is given in Figure 4.
The meaning of the syntax is as follows. $F$ presents the Boolean structure of the formula, $A$
stands for the top-level constraints, $B$ gives the Boolean restrictions and $T$ the Presburger
arithmetic terms. The operator dvd stands for the divisibility relation and $\mathcal{U}$ represents the
universal set. The remaining interpretations are standard.

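\[
\begin{array}{rcl}
F & ::= & A \mid F_1 \land F_2 \mid F_1 \lor F_2 \mid \lnot F \\
A & ::= & B_1 = B_2 \mid B_1 \subseteq B_2 \mid T_1 = T_2 \mid T_1 < T_2 \mid K \ \mathrm{dvd}\ T \\
B & ::= & x \mid \emptyset \mid \mathcal{U} \mid B_1 \cup B_2 \mid B_1 \cap B_2 \mid B^c \\
T & ::= & k \mid K \mid T_1 + T_2 \mid K \cdot T \mid |B| \\
K & ::= & \ldots \mid -2 \mid -1 \mid 0 \mid 1 \mid 2 \mid \ldots
\end{array}
\]

Figure 4 QFBAPA's syntax

The satisfiability problem of this logic is reducible to propositional satisfiability in
polynomial time. Our proofs will rely on the method of [30], which we sketch briefly here. The
basic argument to establish an NP complexity bound on the satisfiability problem of QFBAPA
is based on a theorem by Eisenbrand and Shmonin [17], which in our context says that any
element of an integer cone can be expressed in terms of a polynomial number of generators.
Figure 5 gives a verifier for this basic version of the algorithm. The algorithm uses an
auxiliary verifier $V_{PA}$ for the quantifier-free fragment of Presburger arithmetic. The key step
is showing equisatisfiability between 2.(b) and 2.(c). If $x_1, \ldots, x_e$ are the variables occurring
in $b_0, \ldots, b_p$ then we write $p_\beta = \bigcap_{i=1}^{e} x_i^{e_i}$ for $\beta = (e_1, \ldots, e_e) \in \{0,1\}^e$ where we define $x^1 := x$
and $x^0 := \mathcal{U} \setminus x$ as before. If we define $\llbracket b_i \rrbracket_\beta$ as the evaluation of $b_i$ as a propositional formula
with the assignment given in $\beta$ and introduce variables $l_\beta = |p_\beta|$, then

\[
|b_i| = \sum_{j=0}^{2^e-1} \llbracket b_i \rrbracket_{\beta_j} \, l_{\beta_j}
\]

so the restriction $\bigwedge_{i=0}^{p} |b_i| = k_i$ in 2.(b) becomes

\[
\bigwedge_{i=0}^{p} \sum_{j=0}^{2^e-1} \llbracket b_i \rrbracket_{\beta_j} \, l_{\beta_j} = k_i
\]

which can be seen as a linear combination in the set of vectors
$\{ (\llbracket b_0 \rrbracket_{\beta_j}, \ldots, \llbracket b_p \rrbracket_{\beta_j}) \mid j \in \{0, \ldots, 2^e - 1\} \}$, i.e. as

\[
\sum_{j=0}^{2^e-1} \begin{pmatrix} \llbracket b_0 \rrbracket_{\beta_j} \\ \vdots \\ \llbracket b_p \rrbracket_{\beta_j} \end{pmatrix} \cdot l_{\beta_j} = \begin{pmatrix} k_0 \\ \vdots \\ k_p \end{pmatrix}
\]

Eisenbrand-Shmonin's result then allows us to derive 2.(c) for $N$ polynomial in $|x|$. In the other
direction, it is sufficient to set $l_{\beta_j} = 0$ for $j \in \{0, \ldots, 2^e - 1\} \setminus \{i_1, \ldots, i_N\}$. Thus, we have: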


> Theorem 21 ([30]). The satisfiability problem of QFBAPA is in NP.
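The identity $|b_i| = \sum_j \llbracket b_i \rrbracket_{\beta_j} l_{\beta_j}$ used in the sketch above can be checked on concrete finite sets. The following is a minimal illustration, assuming a hypothetical universe of ten elements and two set variables `x1`, `x2`; the helper names `venn_sizes` and `cardinality_via_regions` are invented for this example and do not come from [30].

```python
from itertools import product


def venn_sizes(universe, sets):
    """Map each bit-string beta to l_beta = |p_beta|, the size of its Venn region."""
    sizes = {beta: 0 for beta in product((0, 1), repeat=len(sets))}
    for x in universe:
        beta = tuple(int(x in s) for s in sets)
        sizes[beta] += 1
    return sizes


def cardinality_via_regions(b, sizes):
    """|b| as the sum over beta of [[b]]_beta * l_beta, with b a Boolean function on bits."""
    return sum(int(bool(b(beta))) * l for beta, l in sizes.items())


# Hypothetical concrete interpretation of two set variables.
universe = set(range(10))
x1 = {0, 1, 2, 3}
x2 = {2, 3, 4, 5, 6}
sizes = venn_sizes(universe, [x1, x2])

# b = x1 intersected with the complement of x2, read as "e1 and not e2".
b = lambda beta: beta[0] == 1 and beta[1] == 0
size_of_b = cardinality_via_regions(b, sizes)
```

With these sets, `size_of_b` agrees with `len(x1 - x2)`. The same function also illustrates rule 2.(a) of Figure 5: an inclusion $b_1 \subseteq b_2$ holds exactly when the value computed for $b_1 \cap b_2^c$ is zero.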
From Theorems 19 and 21, we obtain the following improvement of [37, Theorem 2]: 


> Corollary 22. Let $\mathrm{Th}_{\exists^*}(D)$ be the existential first-order theory of the formulae used in
the transitions of the symbolic tree automaton $M$. If $\mathrm{Th}_{\exists^*}(D) \in C$ for some $C \supseteq \mathrm{NP}$ then
$\mathcal{L}(M) \neq \emptyset \in C$.


Proof. The procedure non-deterministically guesses the value of the variables $s, \sigma, \beta_1, \ldots, \beta_s$
and uses a decision procedure for $\mathrm{Th}_{\exists^*}(D)$ and a non-deterministic polynomial time decision
procedure for QFBAPA to check the corresponding sub-formulae in (4). The correctness of
the procedure follows from Theorem 19. <


Observe that in typical examples, $\mathrm{Th}_{\exists^*}(D) \in \mathrm{NP}$ and thus, from Corollary 22 it follows
that $\mathcal{L}(M) \neq \emptyset \in \mathrm{NP}$. This partially explains the success in the automation of SFAs in SMT
solvers, which rely on solvers for propositional satisfiability.


6 Decision Procedure for Non-Emptiness with Cardinalities


We now consider a generalisation of the language of a symbolic tree automaton from Lemma 10 
with cardinality constraints on the effective Boolean algebra. Similar extensions for related 
models of automata are considered in the literature on data words [19]. 


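> Definition 23. A symbolic tree automaton with cardinalities accepts a language of the
form:

\[
\mathcal{L}(M) = \left\{ d \in D^* \;\middle|\; F(S_1, \ldots, S_k) \land \bigwedge_{i=1}^{k} S_i = \{ n \in \{0,1\}^* \mid \varphi_i(d(n)) \} \land \tau \in M(L_1, \ldots, L_m) \land \bigwedge_{i=1}^{k} S_i = \{ n \in \{0,1\}^* \mid \tau_i(n) \} \right\}
\]

where $F$ is a formula from QFBAPA.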


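Thus, checking non-emptiness of the language of a symbolic tree automaton with cardinalities
reduces to checking whether the following formula is true:

\[
\begin{aligned}
&\exists S_1, \ldots, S_k.\ F(S_1, \ldots, S_k) \land {} \\
&\exists d.\ \bigwedge_{i=1}^{k} S_i = \{ n \in \{0,1\}^* \mid \varphi_i(d(n)) \} \land {} \\
&\exists \tau \in M(L_1, \ldots, L_m) \land \bigwedge_{i=1}^{k} S_i = \{ n \in \{0,1\}^* \mid \tau_i(n) \}
\end{aligned}
\tag{7}
\]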

To show that Theorem 19 and Corollary 22 stay true with linear arithmetic constraints 
on the cardinalities, we need to repeat part of the argument in Theorem 19 since if F 
denotes the newly introduced QFBAPA formula and G, H are the formulae shown equivalent 
in Theorem 19, then, from: 


\[
\exists S_1, \ldots, S_k.\ F(S_1, \ldots, S_k) \land G(S_1, \ldots, S_k)
\]


[irs Sea] s [Si as Se S 
it does not follow that: 


\[
\exists S_1, \ldots, S_k.\ F(S_1, \ldots, S_k) \land H(S_1, \ldots, S_k)
\]


Instead, the algorithm derives the cardinality constraints from each theory and then uses the
sparsity of solutions over the satisfiable regions. In the proof, we use the notations $\llbracket b_i \rrbracket_{\beta_j}$
and $l_\beta$ introduced in Section 5.
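The reason the argument has to be repeated can be seen on a small example. The sketch below assumes one possible reading of the remark above, namely that $G$ and $H$ are only known to be equisatisfiable once the set variables are existentially quantified. The concrete formulas `F`, `G` and `H` are hypothetical cardinality constraints over a single set variable, chosen only for illustration.

```python
from itertools import chain, combinations

UNIVERSE = (1, 2)


def subsets(xs):
    """All subsets of xs, as Python sets."""
    return [set(c) for c in chain.from_iterable(
        combinations(xs, r) for r in range(len(xs) + 1))]


def exists(pred):
    """Brute-force existential quantification over one set variable S."""
    return any(pred(S) for S in subsets(UNIVERSE))


# Hypothetical formulas: G and H are both satisfiable, but by different sets.
G = lambda S: len(S) == 1
H = lambda S: len(S) == 2
F = lambda S: len(S) <= 1  # additional cardinality constraint

both_satisfiable = exists(G) and exists(H)
f_and_g = exists(lambda S: F(S) and G(S))
f_and_h = exists(lambda S: F(S) and H(S))
```

In this instance `both_satisfiable` and `f_and_g` hold while `f_and_h` fails, so adding the same constraint $F$ to two equisatisfiable formulas need not preserve equisatisfiability.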


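> Theorem 24. Formula (7) is equivalent to:

\[
\begin{aligned}
&\exists N \leq p(|F|).\ \exists s \in [m].\ \sigma : [s] \hookrightarrow [m].\ \beta_1, \ldots, \beta_N \in \{0,1\}^k.\ \bigwedge_{j=1}^{N} \exists d.\ \varphi^{\beta_j}(d) \land {} \\
&\exists k_1, \ldots, k_m.\ \exists S_1, \ldots, S_k, P_1, \ldots, P_s. \\
&\rho(k_1, \ldots, k_m) \land \bigwedge_{i=1}^{s} P_i \subseteq p_{L_{\sigma(i)}} \land \bigcup_{i=1}^{m} p_{L_i} = \bigcup_{i=1}^{s} P_i \land {} \\
&\bigwedge_{i=1}^{s} |P_i| = k_{\sigma(i)} \land \bigcup_{i=1}^{N} p_{\beta_i} = \bigcup_{i=1}^{s} P_i \land F(S_1, \ldots, S_k)
\end{aligned}
\tag{8}
\]

where $p$ is a polynomial and $|F|$ is the number of symbols used to write $F$.

Proof. The proof is deferred to the appendix. <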


We can thus formulate the analogue of Corollary 22 for finite symbolic
automata with cardinalities.


> Corollary 25. Let $\mathrm{Th}_{\exists^*}(D)$ be the existential first-order theory of the formulae used in the
transitions of a symbolic finite automaton with cardinality constraints. If $\mathrm{Th}_{\exists^*}(D) \in C$ for
some $C \supseteq \mathrm{NP}$ then $\mathcal{L}(M) \neq \emptyset \in C$.


Proof. As in Corollary 22. < 


7 Conclusion


We have revisited the model of symbolic tree automata as it was introduced in [37]. We 
have obtained tight complexity bounds on their non-emptiness problem. Our methodology 
follows the Feferman-Vaught decomposition technique in that it reduces the non-emptiness 
problem of the automaton to the satisfiability problem of the existential first-order theory of 
the characters accepted by the automaton and the satisfiability problem of the existential 
monadic second-order theory of the indices. 

To combine these two distinct theories we use the ideas from the combination method 
through sets and cardinalities of Wies, Piskac and Kunčak [42] and the computation of an 
equivalent linear-sized existentially quantified Presburger arithmetic formula from the Parikh 
image of a regular tree language. The latter combines two observations. The first observation 
by Klaedtke and Rueß [27] connects this problem with the computation of the Parikh image 
of a context-free grammar. The second observation by Verma, Seidl and Schwentick [40] 
allows computing the Parikh image of a context-free grammar in terms of a linear-sized 
Presburger arithmetic formula. A crucial step in the proofs is a partitioning argument for the 
underlying Venn regions. We profit from the analysis in [30] to extend our arguments to the 
satisfiability problem of finite symbolic automata that consider linear arithmetic restrictions 
over the cardinalities of the Boolean algebra associated with the symbolic finite automaton. 

In future work, we plan to extend our methods to other variants of symbolic automata to
which we believe similar techniques may be applicable. Another interesting research direction
would be to consider extensions of the language that allow free variables in set interpretations
of the form (1), which seems to have applications to various satisfiability problems. Recently,
Hague et al. have found some parallels with the results in this paper [22]. However, our main
motivation was the application of the Feferman-Vaught decidability technique in the bounded
complexity setting. In particular, this work shows the usefulness of taking the reduction
sequence of the Feferman-Vaught theorem to be a partitioning sequence [18, Theorem 3.1].
This was not completely evident at first, and improves over the results in [43], by allowing
the ordering relation to range over non-disjoint regions. A natural continuation of our work
would be to find similar decompositions for different decision problems of interest for symbolic
automata.


A Verifier for QFBAPA


On input $\langle x, w \rangle$:

1. Interpret $w$ as:

   a. a list of indices $i_1, \ldots, i_N \in \{0, \ldots, 2^e - 1\}$ where $e$ is the number of set variables
      in $x$.
   b. a certificate $C$ for $V_{PA}$ on input $x'$ defined below.

2. Transform $x$ into $x'$ by:

   a. rewriting boolean expressions according to the rules:

      \[
      b_1 = b_2 \mapsto b_1 \subseteq b_2 \land b_2 \subseteq b_1 \qquad\qquad b_1 \subseteq b_2 \mapsto |b_1 \cap b_2^c| = 0
      \]

   b. introducing variables $k_i$ for cardinality expressions:

      \[
      G \land \bigwedge_{i=0}^{p} |b_i| = k_i
      \]

      where $G$ is the resulting quantifier-free Presburger arithmetic formula.

   c. rewriting into:

      \[
      G \land \bigwedge_{j = i_1, \ldots, i_N} l_{\beta_j} \geq 0 \land \bigwedge_{i=0}^{p} \sum_{j = i_1, \ldots, i_N} \llbracket b_i \rrbracket_{\beta_j} \cdot l_{\beta_j} = k_i
      \]

3. Run $V_{PA}$ on $\langle x', C \rangle$.

4. Accept iff $V_{PA}$ accepts.
Figure 5 Verifier for QFBAPA

Proof of Theorem 24. $\Rightarrow$) If formula (7) is true, then there are sets $S_1, \ldots, S_k$, a finite tree
$d$ and a tree table $\tau$ such that:

\[
\begin{aligned}
&F(S_1, \ldots, S_k) \land \bigwedge_{i=1}^{k} S_i = \{ n \in \{0,1\}^* \mid \varphi_i(d(n)) \} \land {} \\
&\tau \in M(L_1, \ldots, L_m) \land \bigwedge_{i=1}^{k} S_i = \{ n \in \{0,1\}^* \mid \tau_i(n) \}
\end{aligned}
\tag{9}
\]

Thus, there exists a symbolic table $\bar{s} \in M_s(L_1, \ldots, L_m)$ corresponding to $\tau$. We define
$k_i := |\bar{s}|_{L_i}$, $s = |\{ i \mid k_i \neq 0 \}|$, $\sigma$ maps the indices in $[s]$ to the indices of the terms for
which $k_i$ is non-zero and $P_i = \{ n \in \{0,1\}^* \mid \bar{s}(n) = L_{\sigma(i)} \}$. As in Theorem 19, we have
the equalities $p_{L_i} = \{ n \in \{0,1\}^* \mid \tau(n) \models L_i \}$, $p_{L_i} = \{ n \in \{0,1\}^* \mid L_i(\bar{\varphi}(d)) \}$ and we can
show that the following formula holds:

\[
\begin{aligned}
&\rho(k_1, \ldots, k_m) \land \bigwedge_{i=1}^{s} P_i \subseteq p_{L_{\sigma(i)}} \land \bigcup_{i=1}^{m} p_{L_i} = \bigcup_{i=1}^{s} P_i \land {} \\
&\bigwedge_{i=1}^{s} |P_i| = k_{\sigma(i)} \land F(S_1, \ldots, S_k)
\end{aligned}
\tag{10}
\]

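We need to find a sparse model of (10). To achieve this, we follow the methodology in
Theorem 21. This leads to a system of equations of the form:

\[
\exists c_1, \ldots, c_p.\ G \land \sum_{j=0}^{2^e-1} \begin{pmatrix} \llbracket b_0 \rrbracket_{\beta_j} \\ \vdots \\ \llbracket b_p \rrbracket_{\beta_j} \end{pmatrix} \cdot l_{\beta_j} = \begin{pmatrix} c_1 \\ \vdots \\ c_p \end{pmatrix}
\]

We remove those elementary Venn regions where $l_\beta = 0$. This includes regions whose associated
formula in the interpreted Boolean algebra is unsatisfiable, and regions corresponding
to tree table entries not occurring in $\tau$. This transformation gives a reduced set of indices $R$
participating in the sum.

Using Eisenbrand-Shmonin's theorem, we have a polynomial (in the size of the original
formula) family of Venn regions $\beta_1, \ldots, \beta_N$ and corresponding cardinalities $l_{\beta_1}, \ldots, l_{\beta_N}$, which
we can assume to be non-zero, such that

\[
\exists c_1, \ldots, c_p.\ G \land \sum_{\beta \in \{\beta_1, \ldots, \beta_N\} \subseteq R} \begin{pmatrix} \llbracket b_0 \rrbracket_{\beta} \\ \vdots \\ \llbracket b_p \rrbracket_{\beta} \end{pmatrix} \cdot l_{\beta} = \begin{pmatrix} c_1 \\ \vdots \\ c_p \end{pmatrix}
\tag{11}
\]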


The satisfiability of formula (11) implies the existence of sets of indices $p_\beta$ satisfying the
conditions derived in formula (10). However, it does not determine which explicit indices belong
to these sets or which contents correspond to each index. From the condition

\[
\rho(k_1, \ldots, k_m) \land \bigwedge_{i=1}^{s} P_i \subseteq p_{L_{\sigma(i)}} \land \bigcup_{i=1}^{m} p_{L_i} = \bigcup_{i=1}^{s} P_i \land \bigwedge_{i=1}^{s} |P_i| = k_{\sigma(i)}
\]

it follows that there is a symbolic tree table $\bar{s}'$ satisfying $M_s(L_1, \ldots, L_m)$ with $k_{\sigma(i)}$ letters $L_{\sigma(i)}$
and that these letters are made concrete by entries in $P_i$ for each $i \in \{1, \ldots, s\}$. We take
the Venn regions $\beta \in \{\beta_1, \ldots, \beta_N\}$ such that $P_i \supseteq p_\beta$ and label the corresponding entries in
$\bar{s}'$ with $\beta$. In this way, we obtain a corresponding concrete tree table $\tau'$. This makes the
indices in each Venn region concrete. To make the contents of the indices concrete, note that
for each $\beta \in R$, since $l_\beta \neq 0$, the formula $\exists d.\ \varphi^{\beta}(d)$ is true. In particular, this applies to each
$\beta \in \{\beta_1, \ldots, \beta_N\}$. Thus, we obtain witnesses $d_{\beta_1}, \ldots, d_{\beta_N}$. We form a tree by replacing each
letter $\beta$ in $\tau'$ by the corresponding value $d_\beta$. Then we have that formula (8) holds too.
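The step from region cardinalities to concrete sets can be sketched in isolation. The following is a simplified illustration, assuming the sparse solution is given as a hypothetical dictionary from bit-strings $\beta$ to positive cardinalities $l_\beta$; it uses fresh integers as indices and ignores the tree structure, which the proof handles through the table $\bar{s}'$.

```python
def build_sets(region_sizes, k):
    """Build S_1..S_k from cardinalities l_beta of a few non-empty Venn regions.

    region_sizes maps a bit-string beta (tuple of k bits) to l_beta > 0.
    Each region receives l_beta fresh indices; index x joins S_j iff beta[j] == 1.
    """
    sets = [set() for _ in range(k)]
    fresh = 0
    for beta, size in region_sizes.items():
        for _ in range(size):
            for j in range(k):
                if beta[j] == 1:
                    sets[j].add(fresh)
            fresh += 1
    return sets, fresh


# Hypothetical sparse solution: only three of the 2^3 regions are non-empty.
sets, universe_size = build_sets({(1, 0, 0): 2, (1, 1, 0): 1, (0, 0, 1): 3}, k=3)
```

Only the listed regions contribute elements, so the size of the constructed model is the sum of the chosen $l_\beta$, which is the sense in which the solution is sparse.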


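$\Leftarrow$) If formula (8) is true, then there is $N \leq p(|F|)$ where $p$ is a polynomial, $s \in [m]$,
$\beta_1, \ldots, \beta_N \in \{0,1\}^k$, $k_1, \ldots, k_m \in \mathbb{N}$ and sets $S_1, \ldots, S_k, P_1, \ldots, P_s$ such that

\[
\begin{aligned}
&\bigwedge_{j=1}^{N} \exists d.\ \varphi^{\beta_j}(d) \land \rho(k_1, \ldots, k_m) \land \bigwedge_{i=1}^{s} P_i \subseteq p_{L_{\sigma(i)}} \land \bigcup_{i=1}^{m} p_{L_i} = \bigcup_{i=1}^{s} P_i \land {} \\
&\bigwedge_{i=1}^{s} |P_i| = k_{\sigma(i)} \land \bigcup_{i=1}^{N} p_{\beta_i} = \bigcup_{i=1}^{s} P_i \land F(S_1, \ldots, S_k)
\end{aligned}
\]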



From $\rho(k_1, \ldots, k_m)$ it follows that there is a symbolic table $\bar{s} \in M_s(L_1, \ldots, L_m)$ such that
$|\bar{s}|_{L_i} = k_i$ for each $L_i \in \{ L_1, \ldots, L_m \}$. From formula (9) and

\[
p_{L_1} \cup \ldots \cup p_{L_m} = P_1 \dot{\cup} \ldots \dot{\cup} P_s \land \bigwedge_{i=1}^{s} P_i \subseteq p_{L_{\sigma(i)}} \land \bigwedge_{i=1}^{s} |P_i| = k_{\sigma(i)}
\]

it follows that we can replace the formulae $L_i$ occurring in the symbolic table $\bar{s}$ by the bit-strings
representing the elementary Venn regions to which the indices of the sets $P_i$ belong. Moreover,
thanks to the condition $\bigcup_{i=1}^{N} p_{\beta_i} = \bigcup_{i=1}^{s} P_i$, it follows that we can replace the letters $L_i$ by
the bit-strings $\beta_i$. In this way, we obtain a table $\tau \in M(L_1, \ldots, L_m)$. We then define the
corresponding word over $D$, thanks to the property $\bigwedge_{j=1}^{N} \exists d.\ \varphi^{\beta_j}(d)$. To conclude, note that:

\[
\{ n \in \{0,1\}^* \mid \tau_j(n) \} = \bigcup_{i \,\mid\, \beta_i(j) = 1} P_i = \{ n \in \{0,1\}^* \mid \varphi_j(d(n)) \}
\]


Thus, we have that formula (7) is satisfied by the set variables

\[
S_j := \{ n \in \{0,1\}^* \mid \tau_j(n) \} = \{ n \in \{0,1\}^* \mid \varphi_j(d(n)) \}
\]